Added CPU offloading #3452

Merged
merged 6 commits into from
May 20, 2025

Conversation

cehongwang
Collaborator

Description

Added CPU offloading. Compilation takes no more than 1x GPU memory. Before engine compilation, the model and graph module are moved to CPU.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Mar 26, 2025
@github-actions github-actions bot requested a review from gs-olive March 26, 2025 16:33
@github-actions github-actions bot added the component: tests Issues re: Tests label Mar 31, 2025
@github-actions github-actions bot removed the component: tests Issues re: Tests label Apr 7, 2025
Collaborator

@narendasan narendasan left a comment

Wasn't there supposed to be a bunch of logging?

Collaborator

@peri044 peri044 left a comment

We need to test this change across our entire test suite to ensure it is working as expected.

@github-actions github-actions bot added the component: tests Issues re: Tests label May 9, 2025
@@ -690,6 +685,18 @@ def compile(
    gm = post_lowering(gm, settings)
    logger.debug("Lowered Input graph: " + str(gm.graph))

    # Move the weights in the state_dict to CPU
    if offload_module_to_cpu:
        exported_program.module().to(CPU_DEVICE)
Collaborator

I encountered a situation where this wasn't enough: releasing the memory also required calling torch.cuda.empty_cache() and gc.collect(). Here's a suggestion: change the delete_module function to deallocate_module(module, delete=False) and call it here as deallocate_module(exported_program.module(), delete=False).
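
The suggestion above could look roughly like the following. This is a minimal sketch, not the code that was merged: the signature is taken from the comment, while the body, the "cpu" device string, and the ordering of the cleanup calls are assumptions. It is guarded so that it also runs on a machine without CUDA.

```python
import gc

import torch


def deallocate_module(module: torch.nn.Module, delete: bool = True) -> None:
    """Move `module` to CPU and ask PyTorch to release cached GPU memory.

    Hypothetical sketch of the helper proposed in the review comment.
    """
    # nn.Module.to() moves parameters and buffers in place.
    module.to("cpu")
    if delete:
        # This drops only the local reference. The caller must drop its own
        # references before the module itself can be garbage-collected.
        del module
    # Collect unreachable objects first so any GPU tensors they hold are freed,
    # then hand cached, unused blocks back to the CUDA driver.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# Usage: offload a module but keep it alive for later use.
model = torch.nn.Linear(4, 4)
if torch.cuda.is_available():
    model = model.to("cuda")
deallocate_module(model, delete=False)
```

Passing delete=False matches the call site in the diff, where the exported program still owns the module and only its weights need to leave the GPU.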

Collaborator

@peri044 peri044 left a comment

LGTM

@peri044 peri044 merged commit bb990fd into main May 20, 2025
81 checks passed
Labels
cla signed component: api [Python] Issues re: Python API component: conversion Issues re: Conversion stage component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: tests Issues re: Tests
5 participants